Demonstration of simple problematic distributions and how to interpret the diagnostics.

Load packages

library("rprojroot")
root <- has_file(".Workflow-Examples-root")$make_fix_file()
library(cmdstanr) 
library(posterior)
options(pillar.neg = FALSE, pillar.subtle=FALSE, pillar.sigfig=2)
library(lemon)
library(tidyr) 
library(dplyr) 
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans", base_size=16))
set1 <- RColorBrewer::brewer.pal(7, "Set1")
SEED <- 48927 # set random seed for reproducibility

Improper posterior

Unbounded likelihood without proper prior leading to improper posterior

Data

Univariate continuous x, binary y, and the two classes are completely separable, which leads to an unbounded likelihood.

set.seed(SEED+4)
M=1;
N=10;
x=matrix(sort(rnorm(N)),ncol=M)
y=rep(c(0,1), each=N/2)
data_logit <-list(M = M, N = N, x = x, y = y)
ggplot() +
  geom_point(aes(x, y), data = data.frame(data_logit), size = 3)+
  scale_y_continuous(breaks=c(0,1))
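To see why separability is a problem, the following base R sketch (toy separable data, not the exact simulated data above; the helper `loglik` is ours, not part of the original analysis) shows the Bernoulli log likelihood increasing without bound as the coefficient grows:

```r
# Toy separable data: all y=0 points have x<0, all y=1 points have x>0
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)
# Bernoulli log likelihood with a logit link (intercept fixed at 0 for simplicity)
loglik <- function(beta) sum(dbinom(y, 1, plogis(beta * x), log = TRUE))
# The log likelihood keeps increasing towards 0 as beta grows,
# so there is no finite maximum likelihood estimate
loglik(1); loglik(10); loglik(100)
```

Without a proper prior to counteract this, the posterior mass keeps escaping towards infinity, which is what we will see in the diagnostics below.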

Model

A simple Bernoulli regression (where we have forgotten to include priors)

code_logit <- root("problems", "logit_glm.stan")
writeLines(readLines(code_logit))
// logistic regression
data {
  int<lower=0> N;
  int<lower=0> M;
  int<lower=0,upper=1> y[N];
  matrix[N,M] x;
}
parameters {
  real alpha;
  vector[M] beta;
}
model {
  y ~ bernoulli_logit_glm(x, alpha, beta);
}

Sample

mod_logit <- cmdstan_model(stan_file = code_logit)
fit_logit <- mod_logit$sample(data = data_logit, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 1.0 seconds.
Chain 2 finished in 1.1 seconds.
Chain 3 finished in 1.1 seconds.
Chain 4 finished in 1.2 seconds.

All 4 chains finished successfully.
Mean chain execution time: 1.1 seconds.
Total execution time: 4.8 seconds.

Convergence diagnostics

There are convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:

fit_logit$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/logit_glm-202110211821-1-56d486.csv, /tmp/RtmpoIWO9X/logit_glm-202110211821-2-56d486.csv, /tmp/RtmpoIWO9X/logit_glm-202110211821-3-56d486.csv, /tmp/RtmpoIWO9X/logit_glm-202110211821-4-56d486.csv

Checking sampler transitions treedepth.
2340 of 4000 (58%) transitions hit the maximum treedepth limit of 10, or 2^10 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
For optimal performance, increase this limit.

Checking sampler transitions for divergences.
1660 of 4000 (42%) transitions ended with a divergence.
These divergent transitions indicate that HMC is not fully able to explore the posterior distribution.
Try increasing adapt delta closer to 1.
If this doesn't remove all divergences, try to reparameterize the model.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.

The following parameters had fewer than 0.001 effective draws per transition:
  alpha, beta[1]
Such low values indicate that the effective sample size estimators may be biased high and actual performance may be substantially lower than quoted.

The following parameters had split R-hat greater than 1.1:
  alpha, beta[1]
Such high values indicate incomplete mixing and biased estimation.
You should consider regularizating your model with additional prior information or a more effective parameterization.

Processing complete.

We check \(\widehat{R}\) and ESS values

draws <- as_draws_rvars(fit_logit$draws())
summarize_draws(draws)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 NA NA NA
alpha 1.438586e+29 1.030470e+26 2.902607e+29 1.527775e+26 5.652304e+20 8.341800e+29 2.95 5 NA
beta 3.329284e+29 2.603185e+26 6.752337e+29 3.859482e+26 1.401509e+21 1.946232e+30 2.99 5 NA

Plot

mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))

A quite clear case.

A fixed model with proper priors

A simple Bernoulli regression with proper prior

code_logit2 <- root("problems", "logit_glm2.stan")
writeLines(readLines(code_logit2))
// logistic regression
data {
  int<lower=0> N;
  int<lower=0> M;
  int<lower=0,upper=1> y[N];
  matrix[N,M] x;
}
parameters {
  real alpha;
  vector[M] beta;
}
model {
  alpha ~ normal(0,10);
  beta ~ normal(0,10);
  y ~ bernoulli_logit_glm(x, alpha, beta);
}

Sample

mod_logit2 <- cmdstan_model(stan_file = code_logit2)
fit_logit2 <- mod_logit2$sample(data = data_logit, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.

Convergence diagnostics

There were no convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:

fit_logit2$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/logit_glm2-202110211821-1-184bdd.csv, /tmp/RtmpoIWO9X/logit_glm2-202110211821-2-184bdd.csv, /tmp/RtmpoIWO9X/logit_glm2-202110211821-3-184bdd.csv, /tmp/RtmpoIWO9X/logit_glm2-202110211821-4-184bdd.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

We check \(\widehat{R}\) and ESS values, which in this case all look good.

draws <- as_draws_rvars(fit_logit2$draws())
summarize_draws(draws)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ -2.64 -2.32 1.07 0.78 -4.86 -1.60 1.01 831 1171
alpha 6.03 5.82 2.84 2.88 1.81 11.01 1.01 753 1018
beta 14.36 14.01 6.03 6.22 5.47 24.78 1.01 705 880

Plot

mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))

No problems

A model with unused parameter

A simple Bernoulli regression with proper prior (but we have forgotten to remove unused parameter declaration)

code_logit3 <- root("problems", "logit_glm3.stan")
writeLines(readLines(code_logit3))
// logistic regression
data {
  int<lower=0> N;
  int<lower=0> M;
  int<lower=0,upper=1> y[N];
  matrix[N,M] x;
}
parameters {
  real alpha;
  vector[M] beta;
  real gamma;
}
model {
  alpha ~ normal(0,1);
  beta ~ normal(0,1);
  y ~ bernoulli_logit_glm(x, alpha, beta);
}

Sample

mod_logit3 <- cmdstan_model(stan_file = code_logit3)
fit_logit3 <- mod_logit3$sample(data = data_logit, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 1.5 seconds.
Chain 2 finished in 1.5 seconds.
Chain 3 finished in 1.5 seconds.
Chain 4 finished in 1.5 seconds.

All 4 chains finished successfully.
Mean chain execution time: 1.5 seconds.
Total execution time: 6.2 seconds.

Convergence diagnostics

There are convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:

fit_logit3$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/logit_glm3-202110211822-1-7d17f1.csv, /tmp/RtmpoIWO9X/logit_glm3-202110211822-2-7d17f1.csv, /tmp/RtmpoIWO9X/logit_glm3-202110211822-3-7d17f1.csv, /tmp/RtmpoIWO9X/logit_glm3-202110211822-4-7d17f1.csv

Checking sampler transitions treedepth.
1663 of 4000 (42%) transitions hit the maximum treedepth limit of 10, or 2^10 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
For optimal performance, increase this limit.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.

The following parameters had fewer than 0.001 effective draws per transition:
  gamma
Such low values indicate that the effective sample size estimators may be biased high and actual performance may be substantially lower than quoted.

The following parameters had split R-hat greater than 1.1:
  gamma
Such high values indicate incomplete mixing and biased estimation.
You should consider regularizating your model with additional prior information or a more effective parameterization.

Processing complete.

We check \(\widehat{R}\) and ESS values

draws <- as_draws_rvars(fit_logit3$draws())
summarize_draws(draws)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ -6.630000e+00 -6.310000e+00 1.050000e+00 7.600000e-01 -8.66000e+00 -5.640000e+00 1.00 1769 2207
alpha 3.300000e-01 3.300000e-01 6.100000e-01 5.800000e-01 -6.70000e-01 1.370000e+00 1.00 3339 2920
beta 1.250000e+00 1.220000e+00 7.700000e-01 7.600000e-01 -1.00000e-02 2.570000e+00 1.00 3429 2586
gamma -5.814145e+19 -7.507785e+18 1.291367e+20 7.812153e+19 -3.31251e+20 8.772598e+19 3.38 4 NA

Plots

mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta","gamma"))

A case where trace plot is actually useful

mcmc_trace(as_draws_array(draws), pars=c("gamma"))

A posterior with two parameters competing

Data

We add another column to the previous data matrix. Sometimes the data matrix is augmented with a column of 1’s to represent the intercept effect, but in this case that is redundant, as our model has an explicit intercept term alpha.

M=2;
N=1000;
x=matrix(c(rep(1,N),sort(rnorm(N))),ncol=M)
y=((x[,1]+rnorm(N)/2)>0)+0
data_logit4 <-list(M = M, N = N, x = x, y = y)

We use the previous Bernoulli regression model with proper priors.

code_logit2 <- root("problems", "logit_glm2.stan")
writeLines(readLines(code_logit2))
// logistic regression
data {
  int<lower=0> N;
  int<lower=0> M;
  int<lower=0,upper=1> y[N];
  matrix[N,M] x;
}
parameters {
  real alpha;
  vector[M] beta;
}
model {
  alpha ~ normal(0,10);
  beta ~ normal(0,10);
  y ~ bernoulli_logit_glm(x, alpha, beta);
}

Sample

mod_logit4 <- cmdstan_model(stan_file = code_logit2)
fit_logit4 <- mod_logit4$sample(data = data_logit4, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 3.8 seconds.
Chain 2 finished in 4.3 seconds.
Chain 3 finished in 3.9 seconds.
Chain 4 finished in 3.8 seconds.

All 4 chains finished successfully.
Mean chain execution time: 4.0 seconds.
Total execution time: 16.0 seconds.

With the original one-column x, the computation time was less than 0.1s per chain. Now it is several seconds per chain, which is suspicious.

Convergence diagnostics

There were no convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:

fit_logit4$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/logit_glm2-202110211822-1-406cb8.csv, /tmp/RtmpoIWO9X/logit_glm2-202110211822-2-406cb8.csv, /tmp/RtmpoIWO9X/logit_glm2-202110211822-3-406cb8.csv, /tmp/RtmpoIWO9X/logit_glm2-202110211822-4-406cb8.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

We check \(\widehat{R}\) and ESS values, which in this case are ok, but the ESSs are lower than what we would expect from Stan for such a low-dimensional problem.

draws <- as_draws_rvars(fit_logit4$draws())
summarize_draws(draws)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ -82.96 -82.64 1.20 0.99 -85.29 -81.64 1 1446 1925
alpha 1.95 1.91 7.08 7.11 -9.72 13.50 1 1090 1172
beta[1] 2.26 2.33 7.08 7.08 -9.34 13.84 1 1089 1157
beta[2] -0.28 -0.27 0.25 0.25 -0.69 0.12 1 1561 1493

Plots

mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta[1]","beta[2]"))

And there it is: alpha and beta[1] are super-correlated. We can compute the correlation.

cor(as_draws_matrix(draws)[,c("alpha","beta[1]")])[1,2]
[1] -0.9993102

A correlation close to 1 can also arise for other reasons (see the next example), but one possibility is that two parameters play a similar role in the model. Here the reason is the constant column in x, which we added for demonstration purposes. A constant column can also appear if the predictor matrix is unnecessarily augmented with an intercept column, or if the observed data or the subset used in a specific analysis happens to have only one unique value for some predictor.
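A simple sanity check before fitting can catch this. The helper below is hypothetical (not part of the original analysis) and just flags predictor columns with a single unique value:

```r
# Hypothetical helper: flag predictor columns that are constant
constant_cols <- function(x) {
  which(apply(x, 2, function(col) length(unique(col)) == 1))
}
x <- matrix(c(rep(1, 5), rnorm(5)), ncol = 2)
constant_cols(x)  # returns 1: the first column is constant
```

Running such a check on the data list before calling Stan is cheap compared to diagnosing the slow, correlated posterior afterwards.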

A posterior with very high correlation

The data are Kilpisjärvi summer month temperatures 1952-2013.

data_kilpis <- read.delim(root("problems","kilpisjarvi-summer-temp.csv"), sep = ";")
data_lin <-list(M=1,
                N = nrow(data_kilpis),
                x = matrix(data_kilpis$year, ncol=1),
                y = data_kilpis[,5])

Plot the data

ggplot() +
  geom_point(aes(x, y), data = data.frame(data_lin), size = 1) +
  labs(y = 'Summer temp. @Kilpisjärvi', x= "Year") +
  guides(linetype = "none")

We use a linear model

code_lin <- root("problems", "linear_glm_kilpis.stan")
writeLines(readLines(code_lin))
// linear regression
data {
  int<lower=0> N;
  int<lower=0> M;
  vector[N] y;
  matrix[N,M] x;
}
parameters {
  real alpha;
  vector[M] beta;
  real sigma;
}
model {
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ normal(0, 1);
  y ~ normal_id_glm(x, alpha, beta, sigma);
}

Run Stan

mod_lin <- cmdstan_model(stan_file = code_lin)
fit_lin <- mod_lin$sample(data = data_lin, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 1.3 seconds.
Chain 2 finished in 1.2 seconds.
Chain 3 finished in 1.1 seconds.
Chain 4 finished in 1.2 seconds.

All 4 chains finished successfully.
Mean chain execution time: 1.2 seconds.
Total execution time: 5.2 seconds.

Stan gives a warning: There were X transitions after warmup that exceeded the maximum treedepth.

We can check other diagnostics as follows

fit_lin$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-1-a28d21.csv, /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-2-a28d21.csv, /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-3-a28d21.csv, /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-4-a28d21.csv

Checking sampler transitions treedepth.
16 of 4000 (0.4%) transitions hit the maximum treedepth limit of 10, or 2^10 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
For optimal performance, increase this limit.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete.

We check \(\widehat{R}\) and ESS values, which are fine.

draws <- as_draws_rvars(fit_lin$draws())
summarize_draws(draws)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ -38.50 -38.18 1.25 1.02 -40.89 -37.15 1 1114 1649
alpha -30.86 -30.79 15.06 14.45 -55.47 -5.57 1 1185 1337
beta 0.02 0.02 0.01 0.01 0.01 0.03 1 1184 1337
sigma 1.12 1.11 0.10 0.10 0.97 1.30 1 1350 1287

Plots

mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))

And there it is: alpha and beta are super-correlated. Here the reason is that the x values are in the range 1952–2013, and the intercept alpha denotes the temperature at year 0, which is very far from the data. If the intercept alpha changes, the slope beta needs to change too. The high correlation makes the inference slower, and we can make it faster by centering x.

Here we simply subtract 1982.5 from the year, so that the mean of x is 0. We could also include the centering and the back transformation in the Stan code.
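The back transformation is simple: the slope is unchanged, and the intercept on the original scale is recovered as alpha_centered - beta * mean(x). A minimal check with base R’s lm (ordinary least squares, as a stand-in for the Stan fit; the simulated `year`/`temp` values are ours):

```r
# Centering the predictor changes only the intercept;
# the original-scale intercept is alpha_c - beta * mean(x)
set.seed(1)
year <- 1952:2013
temp <- 9 + 0.02 * (year - 1982.5) + rnorm(length(year), sd = 1)
fit_raw <- lm(temp ~ year)
fit_c   <- lm(temp ~ I(year - mean(year)))
coef(fit_raw)[2] - coef(fit_c)[2]                                  # slopes agree (~0)
coef(fit_raw)[1] - (coef(fit_c)[1] - coef(fit_c)[2] * mean(year))  # ~0
```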

data_lin <-list(M=1,
                N = nrow(data_kilpis),
                x = matrix(data_kilpis$year-1982.5, ncol=1),
                y = data_kilpis[,5])

fit_lin <- mod_lin$sample(data = data_lin, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.

Now there are no treedepth exceedance warnings.

We can check other diagnostics as follows

fit_lin$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-1-809ee4.csv, /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-2-809ee4.csv, /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-3-809ee4.csv, /tmp/RtmpoIWO9X/linear_glm_kilpis-202110211822-4-809ee4.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

We check \(\widehat{R}\) and ESS values, which are now even better.

draws <- as_draws_rvars(fit_lin$draws())
summarize_draws(draws)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ -38.55 -38.21 1.32 1.06 -41.14 -37.11 1 1915 2466
alpha 9.31 9.31 0.15 0.14 9.07 9.56 1 3681 2701
beta 0.02 0.02 0.01 0.01 0.01 0.03 1 3927 2766
sigma 1.12 1.11 0.11 0.10 0.96 1.31 1 3168 2454

Plots

mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))

The posterior dependency has disappeared, and alpha can now be interpreted as the average temperature over the observed years.

A bimodal posterior

A toy example of a bimodal distribution. Bimodal posteriors can arise for many reasons, for example from mixture models or from models with non-log-concave likelihoods or priors (i.e., with thick tails).

Data

Bimodally distributed data

N=20
y=c(rnorm(N/2, mean=-5, sd=1),rnorm(N/2, mean=5, sd=1));
data_tt <-list(N = N, y = y)

Student’s t model

code_tt <- root("problems", "student.stan")
writeLines(readLines(code_tt))
// student-student model
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
}
model {
  mu ~ student_t(4, 0, 100);
  y ~ student_t(4, mu, 1);
}

Sample

mod_tt <- cmdstan_model(stan_file = code_tt)
fit_tt <- mod_tt$sample(data = data_tt, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.

Convergence diagnostics

There are convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:

fit_tt$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/student-202110211823-1-60702b.csv, /tmp/RtmpoIWO9X/student-202110211823-2-60702b.csv, /tmp/RtmpoIWO9X/student-202110211823-3-60702b.csv, /tmp/RtmpoIWO9X/student-202110211823-4-60702b.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.

The following parameters had fewer than 0.001 effective draws per transition:
  mu
Such low values indicate that the effective sample size estimators may be biased high and actual performance may be substantially lower than quoted.

The following parameters had split R-hat greater than 1.05:
  mu
Such high values indicate incomplete mixing and biased estimation.
You should consider regularizating your model with additional prior information or a more effective parameterization.

Processing complete.

High \(\widehat{R}\) and very low ESS. We check \(\widehat{R}\) and ESS values.

draws <- as_draws_rvars(fit_tt$draws())
summarize_draws(draws)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ -85.21 -84.92 0.72 0.30 -86.76 -84.63 1.04 88 NA
mu -2.42 -4.51 3.91 0.62 -5.23 4.63 1.53 7 NA

Histogram shows two modes

mcmc_hist(as_draws_array(draws), pars=c("mu"))

Trace plot shows that the chains are not mixing between the modes

mcmc_trace(as_draws_array(draws), pars=c("mu"))
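The effect of chains stuck in different modes on \(\widehat{R}\) can be reproduced with a simplified split-\(\widehat{R}\) in base R (a basic version without the rank normalization used by the posterior package; the toy `stuck`/`mixed` draws are ours):

```r
# Simplified split-Rhat, to show why chains stuck in different
# modes give a large value. Not the exact posterior::rhat.
split_rhat <- function(draws) {          # draws: iterations x chains matrix
  n <- nrow(draws)
  halves <- cbind(draws[1:(n %/% 2), , drop = FALSE],
                  draws[(n %/% 2 + 1):n, , drop = FALSE])
  m <- nrow(halves)
  chain_means <- colMeans(halves)
  B <- m * var(chain_means)              # between-chain variance
  W <- mean(apply(halves, 2, var))       # within-chain variance
  sqrt(((m - 1) / m * W + B / m) / W)
}
set.seed(1)
stuck <- cbind(rnorm(1000, -5, 1), rnorm(1000, 5, 1))  # two chains, two modes
mixed <- cbind(rnorm(1000, 0, 1), rnorm(1000, 0, 1))
split_rhat(stuck)   # far above 1
split_rhat(mixed)   # close to 1
```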

Easy bimodal posterior

The same example, but with this data the modes are close enough that it is easy for MCMC to jump from one mode to the other.

N=20
y=c(rnorm(N/2, mean=-3, sd=1),rnorm(N/2, mean=3, sd=1));
data_tt <-list(N = N, y = y)

fit_tt <- mod_tt$sample(data = data_tt, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.

Convergence diagnostics

There were no convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:

fit_tt$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/student-202110211823-1-41fb2f.csv, /tmp/RtmpoIWO9X/student-202110211823-2-41fb2f.csv, /tmp/RtmpoIWO9X/student-202110211823-3-41fb2f.csv, /tmp/RtmpoIWO9X/student-202110211823-4-41fb2f.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

We check \(\widehat{R}\) and ESS values, which in this case all look good.

draws <- as_draws_rvars(fit_tt$draws())
summarize_draws(draws)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ -53.69 -53.65 0.44 0.20 -54.45 -53.27 1 1161 2051
mu -0.42 -0.56 1.14 1.39 -2.04 1.48 1 646 1566

Two modes are visible

mcmc_hist(as_draws_array(draws), pars=c("mu"))

The trace plot is not very useful. It shows the chains jumping between the modes, but it is difficult to see whether the jumps happen often enough and the chains are mixing well.

mcmc_trace(as_draws_array(draws), pars=c("mu"))

Rank histogram plot

mcmc_rank_hist(as_draws_array(draws), pars=c("mu"))

Rank ECDF plot

Add here

Initial value issues

MCMC requires some initial values. By default Stan generates them randomly from [-2, 2] (in unconstrained space). Sometimes these initial values are bad and cause numerical issues. Computers in general use a finite number of bits to represent numbers, so very small or very large numbers may not be representable, or there can be a significant loss of accuracy.
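A few base R examples of these floating point limits:

```r
# Double precision limits in practice
exp(710)           # Inf: overflow (the largest double is about 1.8e308)
exp(-746)          # 0: underflow
1e16 + 1 == 1e16   # TRUE: the added 1 is lost, as the spacing of doubles at 1e16 is 2
```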

The data is generated from a Poisson regression model. The Poisson intensity parameter has to be positive, and usually the latent linear predictor is exponentiated to make it positive (the exponentiation can also be justified by multiplicative effects on the Poisson intensity).

set.seed(SEED)
M=1;
N=20;
x=1e3*matrix(c(sort(rnorm(N))),ncol=M)
y=rpois(N,exp(1e-3*x[,1]))
data_pois <-list(M = M, N = N, x = x, y = y)
ggplot() +
  geom_point(aes(x, y), data = data.frame(data_pois), size = 3)

Poisson regression model with proper priors

code_pois <- root("problems", "pois_glm.stan")
writeLines(readLines(code_pois))
// Poisson regression
data {
  int<lower=0> N;
  int<lower=0> M;
  int<lower=0> y[N];
  matrix[N,M] x;
}
parameters {
  real alpha;
  vector[M] beta;
}
model {
  alpha ~ normal(0,10);
  beta ~ normal(0,10);
  y ~ poisson_log_glm(x, alpha, beta);
}

Sample

mod_pois <- cmdstan_model(stan_file = code_pois)
fit_pois <- mod_pois$sample(data = data_pois, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 1.1 seconds.
Chain 2 finished in 0.2 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.3 seconds.
Total execution time: 1.7 seconds.

We get a lot of warnings. Uh, they show in the console, but not in the notebook!

Chain 4 Rejecting initial value:
Chain 4   Log probability evaluates to log(0), i.e. negative infinity.
Chain 4   Stan can't start sampling from this initial value.

Convergence diagnostics

There are convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:

fit_pois$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/pois_glm-202110211823-1-230c83.csv, /tmp/RtmpoIWO9X/pois_glm-202110211823-2-230c83.csv, /tmp/RtmpoIWO9X/pois_glm-202110211823-3-230c83.csv, /tmp/RtmpoIWO9X/pois_glm-202110211823-4-230c83.csv

Checking sampler transitions treedepth.
592 of 4000 (15%) transitions hit the maximum treedepth limit of 10, or 2^10 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
For optimal performance, increase this limit.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
The E-BFMI, 0.0057, is below the nominal threshold of 0.3 which suggests that HMC may have trouble exploring the target distribution.
If possible, try to reparameterize the model.

The following parameters had fewer than 0.001 effective draws per transition:
  alpha, beta[1]
Such low values indicate that the effective sample size estimators may be biased high and actual performance may be substantially lower than quoted.

The following parameters had split R-hat greater than 1.1:
  alpha, beta[1]
Such high values indicate incomplete mixing and biased estimation.
You should consider regularizating your model with additional prior information or a more effective parameterization.

Processing complete.

We check \(\widehat{R}\) and ESS values

draws <- as_draws_rvars(fit_pois$draws())
summarize_draws(draws)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ -9.023224e+80 -1.859e+35 1.986335e+81 2.756153e+35 -5.320882e+81 9.99 NA 5 NA
alpha -1.400000e-01 -9.000e-02 6.500000e-01 1.020000e+00 -1.080000e+00 0.70 NA NA NA
beta -2.000000e-02 0.000e+00 5.000000e-02 3.000000e-02 -1.100000e-01 0.04 3.23 4 NA

Plot

mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))

The reason for the issue is that the initial value for beta is sampled from [-2, 2] and x has some large values. If the initial value for beta is higher than about 0.3 or lower than -0.4, some of the values of exp(alpha + beta * x) will overflow to Inf.
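This overflow is easy to verify directly in R:

```r
# With the unscaled x (values on the order of 1e3), an initial beta of 2
# already overflows the exponentiated linear predictor
exp(2 * 1e3)        # Inf: the implied Poisson rate is not representable
exp(2 * 1e3 / 1e3)  # ~7.4: fine once x is scaled by 1e3
```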

In this case the problem is alleviated by scaling x.

data_pois <-list(M = M, N = N, x = x/1e3, y = y)
ggplot() +
  geom_point(aes(x, y), data = data.frame(data_pois), size = 3)

Poisson regression model with proper priors

code_pois <- root("problems", "pois_glm.stan")
writeLines(readLines(code_pois))
// Poisson regression
data {
  int<lower=0> N;
  int<lower=0> M;
  int<lower=0> y[N];
  matrix[N,M] x;
}
parameters {
  real alpha;
  vector[M] beta;
}
model {
  alpha ~ normal(0,10);
  beta ~ normal(0,10);
  y ~ poisson_log_glm(x, alpha, beta);
}

Sample

mod_pois <- cmdstan_model(stan_file = code_pois)
fit_pois <- mod_pois$sample(data = data_pois, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.

We get a lot of warnings

Chain 4 Rejecting initial value: Chain 4 Log probability evaluates to log(0), i.e. negative infinity. Chain 4 Stan can’t start sampling from this initial value.

Convergence diagnostics

There are convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:

fit_pois$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/pois_glm-202110211823-1-1b0c9a.csv, /tmp/RtmpoIWO9X/pois_glm-202110211823-2-1b0c9a.csv, /tmp/RtmpoIWO9X/pois_glm-202110211823-3-1b0c9a.csv, /tmp/RtmpoIWO9X/pois_glm-202110211823-4-1b0c9a.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

We check \(\widehat{R}\) and ESS values

draws <- as_draws_rvars(fit_pois$draws())
summarize_draws(draws)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ 9.09 9.38 0.99 0.71 7.16 10.03 1 1434 1804
alpha -0.05 -0.04 0.26 0.26 -0.50 0.37 1 1091 1154
beta 1.01 1.01 0.18 0.18 0.73 1.31 1 1151 1376

Plot

mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))

Everything works fine.

It can sometimes be difficult to find good initial values.

If the initial value warning appears only once, it is possible that MCMC was able to escape the bad region and the rest of the inference is ok.
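One remedy is to supply our own initial values. cmdstanr’s $sample() accepts an init argument, for example a function returning a named list of initial values (called once per chain). A sketch (shown without actually running Stan; the 0.1 scale is our illustrative choice, not a general recommendation):

```r
# Draw initial values near zero instead of the default [-2, 2] range
M <- 1
init_fun <- function() {
  list(alpha = rnorm(1, mean = 0, sd = 0.1),
       beta  = rnorm(M, mean = 0, sd = 0.1))
}
# Usage would be, e.g.:
# fit_pois <- mod_pois$sample(data = data_pois, seed = SEED, init = init_fun)
str(init_fun())
```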

We expect Pathfinder to help with initial values.

Thick tailed posterior

A simple Bernoulli regression with proper but thick tailed prior (Cauchy)

code_logit4 <- root("problems", "logit_glm4.stan")
writeLines(readLines(code_logit4))
// logistic regression
data {
  int<lower=0> N;
  int<lower=0> M;
  int<lower=0,upper=1> y[N];
  matrix[N,M] x;
}
parameters {
  real alpha;
  vector[M] beta;
}
model {
  alpha ~ cauchy(0, 10);
  beta ~ cauchy(0, 10);
  y ~ bernoulli_logit_glm(x, alpha, beta);
}

Sample

mod_logit4 <- cmdstan_model(stan_file = code_logit4)
fit_logit4 <- mod_logit4$sample(data = data_logit, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.1 seconds.
Chain 3 finished in 0.1 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.5 seconds.

Convergence diagnostics

There were no convergence issues reported by sampling. We can also explicitly call CmdStan inference diagnostics:

fit_logit4$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/logit_glm4-202110211823-1-159f48.csv, /tmp/RtmpoIWO9X/logit_glm4-202110211823-2-159f48.csv, /tmp/RtmpoIWO9X/logit_glm4-202110211823-3-159f48.csv, /tmp/RtmpoIWO9X/logit_glm4-202110211823-4-159f48.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

We check \(\widehat{R}\) and ESS values, which in this case all look good.

draws <- as_draws_rvars(fit_logit4$draws())
summarize_draws(draws)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ -3.86 -3.28 1.98 1.43 -7.83 -1.92 1.01 225 138
alpha 15.59 10.23 22.65 6.82 2.70 43.35 1.02 187 145
beta 37.28 24.63 50.84 16.58 7.83 104.13 1.01 189 NA

ESSs are not that big, but not really alarming.

Plot

mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))

The tail is long.

Rank plots

mcmc_rank_hist(as_draws_array(draws), pars=c("alpha"))

Histograms are not really uniform.
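The long tails come directly from the Cauchy prior, which places far more mass away from zero than a normal prior of comparable scale:

```r
# Tail comparison of the cauchy(0, 10) prior vs normal(0, 10)
qcauchy(0.999, 0, 10)                     # ~3183
qnorm(0.999, 0, 10)                       # ~31
pcauchy(100, 0, 10, lower.tail = FALSE)   # ~0.032: real mass beyond 100
pnorm(100, 0, 10, lower.tail = FALSE)     # ~8e-24: essentially none
```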

Variance parameter that is not constrained to be positive

Demonstration of what happens if we forget <lower=0> from a parameter that has to be positive.

Data

We simulated x and y independently from normal distributions. As N=8 is small, there will be a lot of uncertainty about the parameters, including the scale sigma.

M=1;
N=8;
set.seed(SEED)
x=matrix(rnorm(N),ncol=M)
y=rnorm(N)/10
data_lin <-list(M = M, N = N, x = x, y = y)

We use linear regression model with proper priors.

code_lin <- root("problems", "linear_glm.stan")
writeLines(readLines(code_lin))
// linear regression
data {
  int<lower=0> N;
  int<lower=0> M;
  vector[N] y;
  matrix[N,M] x;
}
parameters {
  real alpha;
  vector[M] beta;
  real sigma;
}
model {
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  sigma ~ normal(0, 1);
  y ~ normal_id_glm(x, alpha, beta, sigma);
}

Sample

mod_lin <- cmdstan_model(stan_file = code_lin)
fit_lin <- mod_lin$sample(data = data_lin, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.

We repeatedly get the following warning:

Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 4 Exception: normal_id_glm_lpdf: Scale vector is -0.747476, but must be positive finite! (in '/tmp/RtmprEP4gg/model-7caa12ce8e405.stan', line 16, column 2 to column 43)
Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Sometimes these messages appear in the early phase of sampling even if the model has been defined correctly, but here there are too many of them, which indicates that the sampler keeps trying to jump to infeasible values, in this case negative scale parameter values. Many rejections may lead to biased estimates.

Convergence diagnostics

Some divergences are reported, which is not necessarily alarming, but in this case they are likely related to sub-optimal step size adaptation caused by the many rejections. We can also explicitly call CmdStan inference diagnostics:

fit_lin$cmdstan_diagnose()
Processing csv files: /tmp/RtmpoIWO9X/linear_glm-202110211823-1-7310bb.csv, /tmp/RtmpoIWO9X/linear_glm-202110211823-2-7310bb.csv, /tmp/RtmpoIWO9X/linear_glm-202110211823-3-7310bb.csv, /tmp/RtmpoIWO9X/linear_glm-202110211823-4-7310bb.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
22 of 4000 (0.55%) transitions ended with a divergence.
These divergent transitions indicate that HMC is not fully able to explore the posterior distribution.
Try increasing adapt delta closer to 1.
If this doesn't remove all divergences, try to reparameterize the model.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete.
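
The per-chain numbers of divergences and treedepth exceedences can also be obtained programmatically; a sketch using cmdstanr's $diagnostic_summary() method (available in recent cmdstanr versions):

fit_lin$diagnostic_summary()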

We check \(\widehat{R}\) and ESS values, which in this case are ok.

draws <- as_draws_rvars(fit_lin$draws())
summarize_draws(draws)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ 18.09 18.63 1.96 1.53 13.94 20.17 1 765 770
alpha 0.05 0.05 0.03 0.02 0.01 0.10 1 1602 1421
beta 0.04 0.04 0.02 0.02 0.00 0.07 1 1455 1164
sigma 0.07 0.06 0.03 0.02 0.04 0.13 1 1100 801

Plots

mcmc_hist(as_draws_array(draws), pars=c("sigma"))+xlim(c(0,0.31))

The fixed model includes the <lower=0> constraint for sigma.

code_lin2 <- root("problems", "linear_glm2.stan")
writeLines(readLines(code_lin2))
// linear regression
data {
  int<lower=0> N;
  int<lower=0> M;
  vector[N] y;
  matrix[N,M] x;
}
parameters {
  real alpha;
  vector[M] beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  sigma ~ normal(0, 1);
  y ~ normal_id_glm(x, alpha, beta, sigma);
}

Sample

mod_lin2 <- cmdstan_model(stan_file = code_lin2)
fit_lin2 <- mod_lin2$sample(data = data_lin, seed = SEED, refresh = 0)
Running MCMC with 4 sequential chains...

Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.

No sampling warnings. We check \(\widehat{R}\) and ESS values, which are ok.

draws2 <- as_draws_rvars(fit_lin2$draws())
summarize_draws(draws2)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ 15.51 15.93 1.59 1.30 12.42 17.24 1 1130 1507
alpha 0.05 0.05 0.03 0.02 0.00 0.09 1 1891 1749
beta 0.04 0.04 0.02 0.02 0.00 0.07 1 1964 1752
sigma 0.07 0.06 0.03 0.02 0.04 0.12 1 1415 1822

Plots

mcmc_hist(as_draws_array(draws2), pars=c("sigma"))+xlim(c(0,0.31))

In this specific case the bias is negligible, but the sampling with the proper constraint is more efficient.

#' ---
#' title: "Illustration of simple problematic posteriors"
#' author: "Aki Vehtari"
#' date: "First version 2021-06-10. Last modified `r format(Sys.Date())`."
#' output:
#'   html_document:
#'     theme: readable
#'     toc: true
#'     toc_depth: 2
#'     toc_float: true
#'     code_download: true
#' ---


#' Demonstration of simple problematic distributions and how to
#' interpret the diagnostics.
#' 

#+ setup, include=FALSE
knitr::opts_chunk$set(message=FALSE, error=FALSE, warning=FALSE, comment=NA, cache=FALSE)

#' #### Load packages
library("rprojroot")
root<-has_file(".Workflow-Examples-root")$make_fix_file()
library(cmdstanr) 
library(posterior)
options(pillar.neg = FALSE, pillar.subtle=FALSE, pillar.sigfig=2)
library(lemon)
library(tidyr) 
library(dplyr) 
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans", base_size=16))
set1 <- RColorBrewer::brewer.pal(7, "Set1")
SEED <- 48927 # set random seed for reproducibility

#'
#' ## Improper posterior
#' 
#' Unbounded likelihood without proper prior leading to improper
#' posterior
#'
#' ### Data
#'
#' Univariate continuous x, binary y, and the two classes are
#' completely separable, which leads to unbounded likelihood.
set.seed(SEED+4)
M=1;
N=10;
x=matrix(sort(rnorm(N)),ncol=M)
y=rep(c(0,1), each=N/2)
data_logit <-list(M = M, N = N, x = x, y = y)
ggplot() +
  geom_point(aes(x, y), data = data.frame(data_logit), size = 3)+
  scale_y_continuous(breaks=c(0,1))
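
#' With completely separable classes the log likelihood can be pushed
#' ever higher by growing the coefficient, so it has no maximum. A
#' quick check with a small hypothetical helper (`logit_loglik` is not
#' part of the original script; whether the increase is monotone
#' depends on where the intercept puts the decision boundary):
logit_loglik <- function(beta, alpha = 0)
  sum(dbinom(data_logit$y, 1, plogis(alpha + beta * data_logit$x[,1]), log = TRUE))
sapply(c(1, 10, 100), logit_loglik)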

#'
#' ### Model
#'
#' A simple Bernoulli regression (where we have forgotten to include priors)
code_logit <- root("problems", "logit_glm.stan")
writeLines(readLines(code_logit))
#' Sample
mod_logit <- cmdstan_model(stan_file = code_logit)
fit_logit <- mod_logit$sample(data = data_logit, seed = SEED, refresh = 0)

#'
#' ### Convergence diagnostics
#' 
#' There are convergence issues reported by sampling. We can also
#' explicitly call CmdStan inference diagnostics:
fit_logit$cmdstan_diagnose()

#' We check $\widehat{R}$ and ESS values
draws <- as_draws_rvars(fit_logit$draws())
#+ render=lemon_print, digits=c(0,2,2,2,2,2,2,2,0,0)
summarize_draws(draws)

#' Plot
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))

#' Quite clear case
#' 
#' ### A fixed model with proper priors
#'
#' A simple Bernoulli regression with proper prior
code_logit2 <- root("problems", "logit_glm2.stan")
writeLines(readLines(code_logit2))
#' Sample
mod_logit2 <- cmdstan_model(stan_file = code_logit2)
fit_logit2 <- mod_logit2$sample(data = data_logit, seed = SEED, refresh = 0)

#'
#' ### Convergence diagnostics
#' 
#' There were no convergence issues reported by sampling. We can also
#' explicitly call CmdStan inference diagnostics:
fit_logit2$cmdstan_diagnose()

#' We check $\widehat{R}$ and ESS values, which in this case all look good.
draws <- as_draws_rvars(fit_logit2$draws())
#+ render=lemon_print, digits=c(0,2,2,2,2,2,2,2,0,0)
summarize_draws(draws)

#' Plot
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))

#' No problems
#' 

#'
#' ## A model with unused parameter
#' 
#' A simple Bernoulli regression with proper prior (but we have
#' forgotten to remove unused parameter declaration)
code_logit3 <- root("problems", "logit_glm3.stan")
writeLines(readLines(code_logit3))
#' Sample
mod_logit3 <- cmdstan_model(stan_file = code_logit3)
fit_logit3 <- mod_logit3$sample(data = data_logit, seed = SEED, refresh = 0)

#'
#' ### Convergence diagnostics
#' 
#' There are convergence issues reported by sampling. We can also
#' explicitly call CmdStan inference diagnostics:
fit_logit3$cmdstan_diagnose()

#' We check $\widehat{R}$ and ESS values
draws <- as_draws_rvars(fit_logit3$draws())
#+ render=lemon_print, digits=c(0,2,2,2,2,2,2,2,0,0)
summarize_draws(draws)

#' Plots
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta","gamma"))

#' A case where trace plot is actually useful
mcmc_trace(as_draws_array(draws), pars=c("gamma"))

#'
#' ## A posterior with two parameters competing
#'
#' ### Data
#'
#' We add another column to the previous data matrix. Sometimes the
#' data matrix is augmented with a column of 1's to represent the
#' intercept effect, but in this case that is redundant as our model
#' has an explicit intercept term `alpha`.
M=2;
N=1000;
x=matrix(c(rep(1,N),sort(rnorm(N))),ncol=M)
y=((x[,1]+rnorm(N)/2)>0)+0
data_logit4 <-list(M = M, N = N, x = x, y = y)

#' We use the previous Bernoulli regression model with proper priors.
code_logit2 <- root("problems", "logit_glm2.stan")
writeLines(readLines(code_logit2))
#' Sample
mod_logit4 <- cmdstan_model(stan_file = code_logit2)
fit_logit4 <- mod_logit4$sample(data = data_logit4, seed = SEED, refresh = 0)

#' With the original one-column x the computation time was less than
#' 0.1s per chain. Now it is several seconds per chain, which is
#' suspicious.

#'
#' ### Convergence diagnostics
#' 
#' There were no convergence issues reported by sampling. We can also
#' explicitly call CmdStan inference diagnostics:
fit_logit4$cmdstan_diagnose()

#' We check $\widehat{R}$ and ESS values, which in this case are ok,
#' but ESSs are lower than what we would expect from Stan for such a
#' low-dimensional problem.
draws <- as_draws_rvars(fit_logit4$draws())
#+ render=lemon_print, digits=c(0,2,2,2,2,2,2,2,0,0)
summarize_draws(draws)

#' Plots
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta[1]","beta[2]"))

#' And there it is: `alpha` and `beta[1]` are super-correlated.
#' We can compute the correlation.
cor(as_draws_matrix(draws)[,c("alpha","beta[1]")])[1,2]

#' A correlation close to 1 can also arise for other reasons (see the
#' next example), but one possibility is that two parameters have a
#' similar role in the model. Here the reason is the constant column
#' in x, which we put there for demonstration purposes. We may also
#' get a constant column if the predictor matrix is unnecessarily
#' augmented with an intercept column, or if the observed data or the
#' subset of data used in a specific analysis just happens to have
#' only one unique value.
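#'
#' A quick way to detect such a constant column before sampling is to
#' count the unique values in each column of x (a constant column
#' shows up as a single unique value):
apply(x, 2, function(col) length(unique(col)))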
#' 

#'
#' ## A posterior with very high correlation
#'
#' The data are Kilpisjärvi summer month temperatures 1952-2013.
data_kilpis <- read.delim(root("problems","kilpisjarvi-summer-temp.csv"), sep = ";")
data_lin <-list(M=1,
                N = nrow(data_kilpis),
                x = matrix(data_kilpis$year, ncol=1),
                y = data_kilpis[,5])

#' Plot the data
ggplot() +
  geom_point(aes(x, y), data = data.frame(data_lin), size = 1) +
  labs(y = 'Summer temp. @Kilpisjärvi', x= "Year") +
  guides(linetype = "none")

#' We use a linear model
code_lin <- root("problems", "linear_glm_kilpis.stan")
writeLines(readLines(code_lin))

#' Run Stan
mod_lin <- cmdstan_model(stan_file = code_lin)
fit_lin <- mod_lin$sample(data = data_lin, seed = SEED, refresh = 0)

#' Stan gives a warning: There were X transitions after warmup that exceeded the maximum treedepth. 
#' 
#' We can check other diagnostics as follows
fit_lin$cmdstan_diagnose()

#' We check $\widehat{R}$ and ESS values, which are fine.
draws <- as_draws_rvars(fit_lin$draws())
#+ render=lemon_print, digits=c(0,2,2,2,2,2,2,2,0,0)
summarize_draws(draws)

#' Plots
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))

#' And there it is: `alpha` and `beta` are super-correlated.  Here the
#' reason is that the x values are in the range 1952--2013, and the
#' intercept alpha denotes the temperature at year 0, which is very
#' far away from the data. If the intercept alpha changes, the slope
#' beta needs to change too. The high correlation makes the inference
#' slower, and we can make it faster by centering x.
#'
#' Here we simply subtract 1982.5 from the year, so that the mean of x
#' is 0. We could also include the centering and back transformation
#' to Stan code.
data_lin <-list(M=1,
                N = nrow(data_kilpis),
                x = matrix(data_kilpis$year-1982.5, ncol=1),
                y = data_kilpis[,5])

fit_lin <- mod_lin$sample(data = data_lin, seed = SEED, refresh = 0)

#' Now there are no treedepth exceedance warnings.
#' 
#' We can check other diagnostics as follows
fit_lin$cmdstan_diagnose()

#' We check $\widehat{R}$ and ESS values, which are now even better.
draws <- as_draws_rvars(fit_lin$draws())
#+ render=lemon_print, digits=c(0,2,2,2,2,2,2,2,0,0)
summarize_draws(draws)

#' Plots
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))

#'
#' The posterior dependency has disappeared, and alpha can now be
#' interpreted as the average temperature over the observed years.
#' 

#'
#' ## A bimodal posterior
#'
#' A toy example of a bimodal distribution. Bimodal distributions can
#' arise for many reasons, for example in mixture models or in models
#' with non-log-concave likelihoods or priors (i.e., with thick tails).
#' 
#' ### Data
#'
#' Bimodally distributed data
N=20
y=c(rnorm(N/2, mean=-5, sd=1),rnorm(N/2, mean=5, sd=1));
data_tt <-list(N = N, y = y)

#' Student's t model
code_tt <- root("problems", "student.stan")
writeLines(readLines(code_tt))
#' Sample
mod_tt <- cmdstan_model(stan_file = code_tt)
fit_tt <- mod_tt$sample(data = data_tt, seed = SEED, refresh = 0)

#'
#' ### Convergence diagnostics
#' 
#' There are convergence issues reported by sampling. We can also
#' explicitly call CmdStan inference diagnostics:
fit_tt$cmdstan_diagnose()

#' High Rhat and very low ESS

#' We check $\widehat{R}$ and ESS values.
draws <- as_draws_rvars(fit_tt$draws())
#+ render=lemon_print, digits=c(0,2,2,2,2,2,2,2,0,0)
summarize_draws(draws)

#' Histogram shows two modes
mcmc_hist(as_draws_array(draws), pars=c("mu"))

#' Trace plot shows that the chains are not mixing between the modes
mcmc_trace(as_draws_array(draws), pars=c("mu"))
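
#' The sticking is also visible numerically in the per-chain means of
#' mu; a quick check (each chain should stay near one of the modes):
apply(extract_variable_matrix(draws, "mu"), 2, mean)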

#' ### Easy bimodal posterior
#'
#' The same example, but with this data, the modes are close enough
#' that it's easy for MCMC to jump from one mode to another.
N=20
y=c(rnorm(N/2, mean=-3, sd=1),rnorm(N/2, mean=3, sd=1));
data_tt <-list(N = N, y = y)

fit_tt <- mod_tt$sample(data = data_tt, seed = SEED, refresh = 0)

#' ### Convergence diagnostics
#' 
#' There were no convergence issues reported by sampling. We can also
#' explicitly call CmdStan inference diagnostics:
fit_tt$cmdstan_diagnose()

#' We check $\widehat{R}$ and ESS values, which in this case all look good.
draws <- as_draws_rvars(fit_tt$draws())
#+ render=lemon_print, digits=c(0,2,2,2,2,2,2,2,0,0)
summarize_draws(draws)

#' Two modes are visible
mcmc_hist(as_draws_array(draws), pars=c("mu"))

#' Trace plot is not very useful. It shows the chains are jumping
#' between modes, but it's difficult to see whether the jumps happen
#' often enough and chains are mixing well.
mcmc_trace(as_draws_array(draws), pars=c("mu"))

#' Rank histogram plot
mcmc_rank_hist(as_draws_array(draws), pars=c("mu"))

#' Rank ECDF plot
#'
#' **Add here**
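#'
#' Recent bayesplot versions include a rank ECDF plot; a sketch
#' (assuming `mcmc_rank_ecdf` is available in the installed bayesplot):
mcmc_rank_ecdf(as_draws_array(draws), pars=c("mu"))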

#'
#' ## Initial value issues
#'
#' MCMC requires some initial values. By default, Stan generates them
#' randomly from [-2,2] (in the unconstrained space). Sometimes these
#' initial values can be bad and cause numerical issues. Computers, in
#' general, use a finite number of bits to represent numbers, and with
#' very small or large numbers there can be problems representing
#' them, or there can be a significant loss of accuracy.
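#'
#' For example, in double precision the exponential function already
#' overflows to infinity near 710:
exp(709)  # still finite, about 8.2e307
exp(710)  # Inf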
#'
#' The data is generated from a Poisson regression model. The Poisson
#' intensity parameter has to be positive and usually the latent
#' linear predictor is exponentiated to be positive (the
#' exponentiation can also be justified by multiplicative effects on
#' Poisson intensity).
set.seed(SEED)
M=1;
N=20;
x=1e3*matrix(c(sort(rnorm(N))),ncol=M)
y=rpois(N,exp(1e-3*x[,1]))
data_pois <-list(M = M, N = N, x = x, y = y)
ggplot() +
  geom_point(aes(x, y), data = data.frame(data_pois), size = 3)

#' Poisson regression model with proper priors
code_pois <- root("problems", "pois_glm.stan")
writeLines(readLines(code_pois))
#' Sample
mod_pois <- cmdstan_model(stan_file = code_pois)
fit_pois <- mod_pois$sample(data = data_pois, seed = SEED, refresh = 0)

#' We get a lot of warnings. Uh, they show in the console, but not in the notebook!
#'
#'```
#' Chain 4 Rejecting initial value:
#' Chain 4   Log probability evaluates to log(0), i.e. negative infinity.
#' Chain 4   Stan can't start sampling from this initial value.
#'```

#'
#' ### Convergence diagnostics
#' 
#' There are convergence issues reported by sampling. We can also
#' explicitly call CmdStan inference diagnostics:
fit_pois$cmdstan_diagnose()

#' We check $\widehat{R}$ and ESS values
draws <- as_draws_rvars(fit_pois$draws())
#+ render=lemon_print, digits=c(0,2,2,2,2,2,2,2,0,0)
summarize_draws(draws)

#' Plot
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))

#' The reason for the issue is that the initial value for `beta` is
#' sampled from [-2, 2] and x has some large values. If the initial
#' value for `beta` is larger than about 0.3 or smaller than -0.4,
#' some of the values of exp(alpha + beta * x) will overflow to `Inf`.
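#'
#' A quick check with the generated x shows the overflow directly:
max(x)           # the x values here are on the order of thousands
exp(2 * max(x))  # overflows to Inf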
#'
#' In this case the problem is alleviated by scaling x.

data_pois <-list(M = M, N = N, x = x/1e3, y = y)
ggplot() +
  geom_point(aes(x, y), data = data.frame(data_pois), size = 3)

#' Poisson regression model with proper priors
code_pois <- root("problems", "pois_glm.stan")
writeLines(readLines(code_pois))
#' Sample
mod_pois <- cmdstan_model(stan_file = code_pois)
fit_pois <- mod_pois$sample(data = data_pois, seed = SEED, refresh = 0)


#'
#' ### Convergence diagnostics
#' 
#' There are no convergence issues reported by sampling. We can also
#' explicitly call CmdStan inference diagnostics:
fit_pois$cmdstan_diagnose()

#' We check $\widehat{R}$ and ESS values
draws <- as_draws_rvars(fit_pois$draws())
#+ render=lemon_print, digits=c(0,2,2,2,2,2,2,2,0,0)
summarize_draws(draws)

#' Plot
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))

#' Everything works fine.
#'
#' It can sometimes be difficult to find good initial values.
#'
#' If the initial value warning comes only once, it is possible that
#' MCMC was able to escape the bad region and the rest of the
#' inference is ok.
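#'
#' One remedy is to supply safer initial values explicitly; in recent
#' cmdstanr versions this can be done by passing an `init` function to
#' `$sample()` (a sketch; the all-zero inits are just an illustrative
#' choice):
fit_pois_init <- mod_pois$sample(data = data_pois, seed = SEED, refresh = 0,
                                 init = function() list(alpha = 0, beta = 0))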
#'
#' We expect Pathfinder to help with initial values.
#' 

#' ## Thick tailed posterior
#'
#' A simple Bernoulli regression with proper but thick tailed prior (Cauchy)
code_logit4 <- root("problems", "logit_glm4.stan")
writeLines(readLines(code_logit4))
#' Sample
mod_logit4 <- cmdstan_model(stan_file = code_logit4)
fit_logit4 <- mod_logit4$sample(data = data_logit, seed = SEED, refresh = 0)

#'
#' ### Convergence diagnostics
#' 
#' There were no convergence issues reported by sampling. We can also
#' explicitly call CmdStan inference diagnostics:
fit_logit4$cmdstan_diagnose()

#' We check $\widehat{R}$ and ESS values, which in this case all look good.
draws <- as_draws_rvars(fit_logit4$draws())
#+ render=lemon_print, digits=c(0,2,2,2,2,2,2,2,0,0)
summarize_draws(draws)

#' ESSs are not that big, but not really alarming

#' Plot
mcmc_pairs(as_draws_array(draws), pars=c("alpha","beta"))

#' Tail is long

#' Rank-plots
mcmc_rank_hist(as_draws_array(draws), pars=c("alpha"))

#' Histograms are not really uniform.

#'
#' ## Variance parameter that is not constrained to be positive
#'
#' Demonstration of what happens if we forget the <lower=0> constraint
#' on a parameter that has to be positive.
#' 
#' ### Data
#'
#' We simulate x and y independently from normal distributions. As
#' N=8 is small, there will be a lot of uncertainty about the
#' parameters, including the scale sigma.
M=1;
N=8;
set.seed(SEED)
x=matrix(rnorm(N),ncol=M)
y=rnorm(N)/10
data_lin <-list(M = M, N = N, x = x, y = y)

#' We use a linear regression model with proper priors.
code_lin <- root("problems", "linear_glm.stan")
writeLines(readLines(code_lin))
#' Sample
mod_lin <- cmdstan_model(stan_file = code_lin)
fit_lin <- mod_lin$sample(data = data_lin, seed = SEED, refresh = 0)

#' We repeatedly get the following warning:
#'```
#' Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#' Chain 4 Exception: normal_id_glm_lpdf: Scale vector is -0.747476, but must be positive finite! (in '/tmp/RtmprEP4gg/model-7caa12ce8e405.stan', line 16, column 2 to column 43)
#' Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#' Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#'```
#'
#' Sometimes these messages appear in the early phase of sampling even
#' if the model has been defined correctly, but here there are too
#' many of them, which indicates that the sampler keeps trying to jump
#' to infeasible values, in this case negative scale parameter values.
#' Many rejections may lead to biased estimates.
#'
#' ### Convergence diagnostics
#' 
#' Some divergences are reported, which is not necessarily alarming,
#' but in this case they are likely related to sub-optimal step size
#' adaptation caused by the many rejections. We can also explicitly
#' call CmdStan inference diagnostics:
fit_lin$cmdstan_diagnose()

#' We check $\widehat{R}$ and ESS values, which in this case are ok.
draws <- as_draws_rvars(fit_lin$draws())
#+ render=lemon_print, digits=c(0,2,2,2,2,2,2,2,0,0)
summarize_draws(draws)

#' Plots
mcmc_hist(as_draws_array(draws), pars=c("sigma"))+xlim(c(0,0.31))

#' The fixed model includes the <lower=0> constraint for sigma.
code_lin2 <- root("problems", "linear_glm2.stan")
writeLines(readLines(code_lin2))
#' Sample
mod_lin2 <- cmdstan_model(stan_file = code_lin2)
fit_lin2 <- mod_lin2$sample(data = data_lin, seed = SEED, refresh = 0)

#' No sampling warnings.

#' We check $\widehat{R}$ and ESS values, which are ok.
draws2 <- as_draws_rvars(fit_lin2$draws())
#+ render=lemon_print, digits=c(0,2,2,2,2,2,2,2,0,0)
summarize_draws(draws2)

#' Plots
mcmc_hist(as_draws_array(draws2), pars=c("sigma"))+xlim(c(0,0.31))

#'
#' In this specific case the bias is negligible, but the sampling
#' with the proper constraint is more efficient.
#' 
